Generalized Biwords for Bitext Compression and Translation Spotting

Similar resources

Generalized Biwords for Bitext Compression and Translation Spotting

Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords (pairs of parallel words with a high probability of co-occurrence) that can be used as an intermediat...


Generalized Biwords for Bitext Compression and Translation Spotting: Extended Abstract

The increasing availability of large collections of bilingual parallel corpora has fostered the development of natural-language processing applications that address bilingual tasks, such as corpus-based machine translation, the automatic extraction of bilingual lexicons, and translation spotting [Simard, 2003]. A bilingual parallel corpus, or bitext, is a textual collection that contains pairs o...


Boosting Bitext Compression

Bilingual parallel corpora, also known as bitexts, convey the same information in two different languages. This implies that when modelling bitexts one can take advantage of the fact that there exists a relation between both texts; the text alignment task allows this relationship to be established. In this paper we propose different approaches that use words and biwords (pairs made of two words, each ...


Translation Spotting for Translation Memories

The term translation spotting (TS) refers to the task of identifying the target-language (TL) words that correspond to a given set of source-language (SL) words in a pair of text segments known to be mutual translations. This article examines this task within the context of a sub-sentential translation-memory system, i.e. a translation support tool capable of proposing translations for portions ...


Bitext Alignment for Statistical Machine Translation

Bitext alignment is the task of finding translation equivalence between documents in two languages, collections of which are commonly known as bitext. This dissertation addresses the problems of statistical alignment at various granularities from sentence to word with the goal of creating Statistical Machine Translation (SMT) systems. SMT systems are statistical pattern processors based on para...



Journal

Journal title: Journal of Artificial Intelligence Research

Year: 2012

ISSN: 1076-9757

DOI: 10.1613/jair.3500